AITopics | faster rate

Fast Instrument Learning with Faster Rates

Neural Information Processing Systems, Dec-24-2025, 09:37:36 GMT

We investigate nonlinear instrumental variable (IV) regression given high-dimensional instruments. We propose a simple algorithm which combines kernelized IV methods and an arbitrary, adaptive regression algorithm, accessed as a black box. Our algorithm enjoys faster-rate convergence and adapts to the dimensionality of informative latent features, while avoiding an expensive minimax optimization procedure, which has been necessary to establish similar guarantees. It further brings the benefit of flexible machine learning models to quasi-Bayesian uncertainty quantification, likelihood-based model selection, and model averaging. Simulation studies demonstrate the competitive performance of our method.
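As a point of reference for the instrumental variable idea, here is a minimal sketch of the classical linear IV (Wald) estimator with one instrument and one regressor. It is not the kernelized algorithm from the abstract. The toy data, the confounder `u`, and all function names are assumptions made for illustration.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (population normalization)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Ordinary least squares slope of y on x; biased when x is confounded."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Linear IV estimate: cov(z, y) / cov(z, x), using instrument z."""
    return cov(z, y) / cov(z, x)

# Toy data (assumed): u is an unobserved confounder, z is an instrument
# that moves x but is uncorrelated with u in this sample.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true causal slope is 2

print(ols_slope(x, y))    # 3.5, pulled away from 2 by the confounder
print(iv_slope(z, x, y))  # 2.0, the instrument removes the confounding
```

In this toy sample the instrument is exactly uncorrelated with the confounder, which is why the IV estimate recovers the slope exactly. With real data that holds only approximately.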

fast instrument learning, faster rate, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.63)

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

Neural Information Processing Systems, Dec-24-2025, 07:27:20 GMT

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent [31, 32] and Adam [14]. In order to combine the benefits of communication compression and convergence acceleration, we propose a \emph{compressed and accelerated} gradient method based on ANITA [20] for distributed optimization, which we call CANITA.

convex optimization, frac, omega, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization

Neural Information Processing Systems, Dec-24-2025, 02:07:16 GMT

In this paper, we propose an accelerated quasi-Newton proximal extragradient method for solving unconstrained smooth convex optimization problems. With access only to the gradients of the objective, we prove that our method can achieve a convergence rate of $\mathcal{O}\bigl(\min\{\frac{1}{k^2}, \frac{\sqrt{d\log k}}{k^{2.5}}\}\bigr)$,
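To see when each term of the minimum is active, a short worked comparison may help. It reads $k$ as the iteration counter and $d$ as the problem dimension, which is an assumption since the truncated abstract does not define them here. For $k > 1$:

$$\frac{\sqrt{d\log k}}{k^{2.5}} \le \frac{1}{k^2} \iff \sqrt{d\log k} \le \sqrt{k} \iff d\log k \le k.$$

Under this reading, the bound coincides with the classical $\frac{1}{k^2}$ rate while $k < d\log k$, and the second term becomes the smaller one only once $k \ge d\log k$.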

accelerated quasi-newton proximal extragradient, faster rate, smooth convex optimization, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization

Neural Information Processing Systems, Dec-23-2025, 23:15:53 GMT

We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points \emph{without replacement} leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely \emph{Random Reshuffling} (RR), which shuffles the data every epoch, and \emph{Single Shuffling} or \emph{Shuffle Once} (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-\L{}ojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of \emph{data-ordering attacks}, where an adversary manipulates the order in which data points are supplied to the optimizer. Our analysis also recovers tight rates for the \emph{incremental gradient} method, where the data points are not shuffled at all.
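The sampling strategies named in the abstract differ only in the order in which data indices are visited. A minimal sketch of the four orderings follows. The function names are hypothetical, and the optimizer itself (for example gradient descent ascent) is left out.

```python
import random

def with_replacement(n, epochs, rng):
    """Uniform sampling: every step draws an index independently."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]

def random_reshuffling(n, epochs, rng):
    """RR: draw a fresh permutation of the data at every epoch."""
    orders = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        orders.append(perm)
    return orders

def shuffle_once(n, epochs, rng):
    """SO: draw one permutation at the start and reuse it every epoch."""
    perm = list(range(n))
    rng.shuffle(perm)
    return [list(perm) for _ in range(epochs)]

def incremental(n, epochs):
    """Incremental gradient: fixed order 0..n-1, never shuffled."""
    return [list(range(n)) for _ in range(epochs)]

rng = random.Random(0)
for name, orders in [
    ("with replacement", with_replacement(5, 2, rng)),
    ("random reshuffling", random_reshuffling(5, 2, rng)),
    ("shuffle once", shuffle_once(5, 2, rng)),
    ("incremental", incremental(5, 2)),
]:
    print(name, orders)
```

In the three without-replacement schemes every data point is visited exactly once per epoch, whereas sampling with replacement may repeat some indices and skip others within an epoch.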

faster rate, finite-sum minimax optimization, replacement lead, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Faster Rates for No-Regret Learning in General Games via Cautious Optimism

Soleymani, Ashkan, Piliouras, Georgios, Farina, Gabriele

arXiv.org Artificial Intelligence, Mar-31-2025

We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
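For orientation, here is a minimal sketch of the classic OMWU update that the abstract builds on, run in self-play on a two-player zero-sum matrix game. It uses a constant learning rate `eta`, so it does not implement the adaptive, non-monotonic rate the abstract describes. The payoff matrix, function names, and parameter values are assumptions for illustration.

```python
import math

def omwu_step(x, u_now, u_prev, eta):
    """One Optimistic MWU update on the probability simplex.

    x is the current mixed strategy; u_now and u_prev are the utility
    vectors observed in the current and previous rounds. The optimistic
    term 2*u_now - u_prev predicts that the next utility resembles u_now.
    """
    weights = [xi * math.exp(eta * (2.0 * a - b))
               for xi, a, b in zip(x, u_now, u_prev)]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(A, steps, eta):
    """Both players run OMWU; the row player receives A, the column player -A."""
    n_rows, n_cols = len(A), len(A[0])
    x = [1.0 / n_rows] * n_rows
    y = [1.0 / n_cols] * n_cols
    ux_prev = [0.0] * n_rows
    uy_prev = [0.0] * n_cols
    for _ in range(steps):
        ux = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        uy = [-sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        x, y = omwu_step(x, ux, ux_prev, eta), omwu_step(y, uy, uy_prev, eta)
        ux_prev, uy_prev = ux, uy
    return x, y

# A biased matching-pennies style game (assumed payoffs).
A = [[2.0, -1.0], [-1.0, 1.0]]
print(self_play(A, steps=200, eta=0.1))
```

The update is uncoupled in the sense that each player only uses its own utility vectors, never the other player's strategy directly.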

artificial intelligence, machine learning, no-regret learning, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3717823.3718242

2503.2434

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-7-2025, 18:58:41 GMT

We thank all the reviewers for taking the time to read and comment on our work. We will use the comments to improve the paper. Below we comment on some specific issues that were raised. R1: These are good points regarding the experiments; we will update the plots following these suggestions. Note that uniform and Lipschitz are the same in some plots because the rows of the data are normalized (Lipschitz can still give improvements here because it depends on the potentially smaller Lipschitz constant of the deterministic part).

author feedback and meta-review, dataset, linearly-convergent method, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.34)

Accelerated Quasi-Newton Proximal Extragradient: Faster Rate for Smooth Convex Optimization

Neural Information Processing Systems, Jan-24-2025, 03:35:03 GMT

In this paper, we propose an accelerated quasi-Newton proximal extragradient method for solving unconstrained smooth convex optimization problems.

accelerated quasi-newton proximal extragradient, mathcal, smooth convex optimization, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Fast Instrument Learning with Faster Rates

Neural Information Processing Systems, Oct-11-2024, 11:19:48 GMT

We investigate nonlinear instrumental variable (IV) regression given high-dimensional instruments. We propose a simple algorithm which combines kernelized IV methods and an arbitrary, adaptive regression algorithm, accessed as a black box. Our algorithm enjoys faster-rate convergence and adapts to the dimensionality of informative latent features, while avoiding an expensive minimax optimization procedure, which has been necessary to establish similar guarantees. It further brings the benefit of flexible machine learning models to quasi-Bayesian uncertainty quantification, likelihood-based model selection, and model averaging. Simulation studies demonstrate the competitive performance of our method.

algorithm, fast instrument learning, faster rate

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

Neural Information Processing Systems, Oct-11-2024, 04:10:24 GMT

Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Besides, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communications (faster convergence), e.g., Nesterov's accelerated gradient descent [31, 32] and Adam [14]. In order to combine the benefits of communication compression and convergence acceleration, we propose a \emph{compressed and accelerated} gradient method based on ANITA [20] for distributed optimization, which we call CANITA. Our results show that as long as the number of devices $n$ is large (often true in distributed/federated learning), or the compression $\omega$ is not very high, CANITA achieves the faster convergence rate $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$, i.e., the number of communication rounds is $O\Big(\sqrt{\frac{L}{\epsilon}}\Big)$ (vs. ...). As a result, CANITA enjoys the advantages of both compression (compressed communication in each round) and acceleration (far fewer communication rounds).
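To make the compression parameter concrete, here is a minimal sketch of one common unbiased compressor, random-$k$ sparsification. The abstract does not say which compressor is used. The sketch assumes the usual definition from the compressed-communication literature, in which $\omega$ bounds the relative variance, $\mathbb{E}\|C(x) - x\|^2 \le \omega \|x\|^2$. The function names are hypothetical.

```python
import random

def rand_k(x, k, rng):
    """Keep k coordinates chosen uniformly at random, zero the rest.

    Survivors are scaled by d/k so that the compressor is unbiased,
    i.e. E[C(x)] = x. Only k values (plus indices) need to be sent.
    """
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]

def omega_rand_k(d, k):
    """Variance parameter of rand-k: E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    return d / k - 1.0

rng = random.Random(0)
g = [1.0, 2.0, 3.0, 4.0]
print(rand_k(g, 2, rng))   # two coordinates survive, each doubled
print(omega_rand_k(4, 2))  # 1.0
```

Sending fewer coordinates (smaller `k`) lowers the per-round cost but raises $\omega$, which is the trade-off behind the abstract's condition that the compression is "not very high".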

communication compression, frac, omega, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization

Neural Information Processing Systems, Oct-10-2024, 11:44:41 GMT

We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points \emph{without replacement} leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely \emph{Random Reshuffling} (RR), which shuffles the data every epoch, and \emph{Single Shuffling} or \emph{Shuffle Once} (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-\L{}ojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of \emph{data-ordering attacks}, where an adversary manipulates the order in which data points are supplied to the optimizer.

emph, finite-sum minimax optimization, replacement lead, (5 more...)

Neural Information Processing Systems

Genre: Play > Prospect (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)
